MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

The complex task of choosing a de novo assembly: lessons from fungal genomes.

Internal identifier: 001A54 (Main/Exploration); previous: 001A53; next: 001A55

Authors: Juan Esteban Gallo [Colombia]; José Fernando Muñoz [Colombia]; Elizabeth Misas [Colombia]; Juan Guillermo McEwen [Colombia]; Oliver Keatinge Clay [Colombia]

Source: Computational Biology and Chemistry, 2014

RBID: pubmed:25262360

Abstract

Selecting the values of parameters used by de novo genomic assembly programs, or choosing an optimal de novo assembly from several runs obtained with different parameters or programs, are tasks that can require complex decision-making. A key parameter that must be supplied to typical next generation sequencing (NGS) assemblers is the k-mer length, i.e., the word size that determines which de Bruijn graph the program should map out and use. The topic of assembly selection criteria was recently revisited in the Assemblathon 2 study (Bradnam et al., 2013). Although no clear message was delivered with regard to optimal k-mer lengths, it was shown with examples that it is sometimes important to decide if one is most interested in optimizing the sequences of protein-coding genes (the gene space) or in optimizing the whole genome sequence including the intergenic DNA, as what is best for one criterion may not be best for the other. In the present study, our aim was to better understand how the assembly of unicellular fungi (which are typically intermediate in size and complexity between prokaryotes and metazoan eukaryotes) can change as one varies the k-mer values over a wide range. We used two different de novo assembly programs (SOAPdenovo2 and ABySS), and simple assembly metrics that also focused on success in assembling the gene space and repetitive elements. A recent increase in Illumina read length to around 150 bp allowed us to attempt de novo assemblies with a larger range of k-mers, up to 127 bp. We applied these methods to Illumina paired-end sequencing read sets of fungal strains of Paracoccidioides brasiliensis and other species. 
By visualizing the results in simple plots, we were able to track the effect of changing k-mer size and assembly program, and to demonstrate how such plots can readily reveal discontinuities or other unexpected characteristics that assembly programs can present in practice, especially when they are used in a traditional molecular microbiology laboratory with a 'genomics corner'. Here we propose and apply a component of a first pass validation methodology for benchmarking and understanding fungal genome de novo assembly processes.
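
The abstract describes the k-mer length as the word size that fixes which de Bruijn graph an assembler builds. The following is a minimal Python sketch of that idea. It is not the SOAPdenovo2 or ABySS implementation. In this sketch, nodes are (k-1)-mers, each distinct k-mer in the reads becomes an edge, and non-branching paths are merged into unitigs (contig-like sequences). Real assemblers also handle both DNA strands, clean sequencing errors out of the graph, and use read pairs for scaffolding; this sketch attempts none of that. N50 stands in as one example of a simple contiguity metric, because the abstract does not say which metrics the authors used. The helper names (`kmers`, `de_bruijn`, `unitigs`, `n50`) and the toy genome and reads are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping word of length k in seq (none if k > len(seq))."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def de_bruijn(reads: Iterable[str], k: int) -> Dict[str, Set[str]]:
    """Nodes are (k-1)-mers; each distinct k-mer adds the edge prefix -> suffix.
    Single-strand only: reverse complements are ignored in this sketch."""
    edges: Set[str] = set()
    for read in reads:
        edges.update(kmers(read, k))
    graph: Dict[str, Set[str]] = defaultdict(set)
    for km in edges:
        graph[km[:-1]].add(km[1:])
    return graph


def unitigs(graph: Dict[str, Set[str]]) -> List[str]:
    """Merge maximal non-branching paths into sequences (sorted for stable output).
    Isolated cycles, where every node has one in- and one out-edge, are skipped."""
    indeg: Dict[str, int] = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            indeg[v] += 1
    nodes = set(graph) | set(indeg)

    def is_internal(n: str) -> bool:
        # Exactly one way in and one way out: the node can be merged.
        return indeg[n] == 1 and len(graph.get(n, ())) == 1

    contigs = []
    for start in nodes:
        if is_internal(start):
            continue
        # Each out-edge of a branching or terminal node starts one unitig.
        for nxt in graph.get(start, ()):
            path, cur = start + nxt[-1], nxt
            while is_internal(cur):
                cur = next(iter(graph[cur]))
                path += cur[-1]
            contigs.append(path)
    return sorted(contigs)


def n50(lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L hold at least half the total."""
    ordered = sorted(lengths, reverse=True)
    half, running = sum(ordered) / 2, 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


if __name__ == "__main__":
    genome = "TTCAGGCAGA"  # toy sequence in which the 3-mer CAG occurs twice
    read_len = 8
    # Hypothetical error-free reads: every 8 bp window of the toy genome.
    reads = [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]
    # Sweep k and report unitig count and N50 for each value.
    for k in range(3, read_len + 2):
        contigs = unitigs(de_bruijn(reads, k))
        print(f"k={k}: {len(contigs)} unitigs, N50={n50(len(c) for c in contigs)}")
```

On this toy input, k = 4 gives three overlapping unitigs because the repeated 3-mer CAG creates a branch. At k = 5 the k-mers span the repeat, and the whole sequence comes back as a single unitig. At k = 9 there is no output at all, because no 9-mer fits in an 8 bp read. The same constraint explains why longer reads allow larger k. A read of length L contains L - k + 1 k-mers, so a 150 bp read still contributes 24 k-mers at k = 127.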

DOI: 10.1016/j.compbiolchem.2014.08.014
PubMed: 25262360



The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">The complex task of choosing a de novo assembly: lessons from fungal genomes.</title>
<author>
<name sortKey="Gallo, Juan Esteban" sort="Gallo, Juan Esteban" uniqKey="Gallo J" first="Juan Esteban" last="Gallo">Juan Esteban Gallo</name>
<affiliation wicri:level="1">
<nlm:affiliation>Cellular & Molecular Biology Unit, Corporación para Investigaciones Biológicas, Medellín, Colombia; Doctoral Program in Biomedical Sciences, Universidad del Rosario, Bogotá, Colombia.</nlm:affiliation>
<country xml:lang="fr">Colombie</country>
<wicri:regionArea>Cellular & Molecular Biology Unit, Corporación para Investigaciones Biológicas, Medellín, Colombia; Doctoral Program in Biomedical Sciences, Universidad del Rosario, Bogotá</wicri:regionArea>
<wicri:noRegion>Bogotá</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Mu Oz, Jose Fernando" sort="Mu Oz, Jose Fernando" uniqKey="Mu Oz J" first="José Fernando" last="Mu Oz">José Fernando Mu Oz</name>
<affiliation wicri:level="1">
<nlm:affiliation>Cellular & Molecular Biology Unit, Corporación para Investigaciones Biológicas, Medellín, Colombia; Institute of Biology, Universidad de Antioquia, Medellín, Colombia.</nlm:affiliation>
<country xml:lang="fr">Colombie</country>
<wicri:regionArea>Cellular & Molecular Biology Unit, Corporación para Investigaciones Biológicas, Medellín, Colombia; Institute of Biology, Universidad de Antioquia, Medellín</wicri:regionArea>
<wicri:noRegion>Medellín</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Misas, Elizabeth" sort="Misas, Elizabeth" uniqKey="Misas E" first="Elizabeth" last="Misas">Elizabeth Misas</name>
<affiliation wicri:level="1">
<nlm:affiliation>Cellular & Molecular Biology Unit, Corporación para Investigaciones Biológicas, Medellín, Colombia; Institute of Biology, Universidad de Antioquia, Medellín, Colombia.</nlm:affiliation>
<country xml:lang="fr">Colombie</country>
<wicri:regionArea>Cellular & Molecular Biology Unit, Corporación para Investigaciones Biológicas, Medellín, Colombia; Institute of Biology, Universidad de Antioquia, Medellín</wicri:regionArea>
<wicri:noRegion>Medellín</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Mcewen, Juan Guillermo" sort="Mcewen, Juan Guillermo" uniqKey="Mcewen J" first="Juan Guillermo" last="Mcewen">Juan Guillermo Mcewen</name>
<affiliation wicri:level="1">
<nlm:affiliation>Cellular & Molecular Biology Unit, Corporación para Investigaciones Biológicas, Medellín, Colombia; School of Medicine, Universidad de Antioquia, Medellín, Colombia.</nlm:affiliation>
<country xml:lang="fr">Colombie</country>
<wicri:regionArea>Cellular & Molecular Biology Unit, Corporación para Investigaciones Biológicas, Medellín, Colombia; School of Medicine, Universidad de Antioquia, Medellín</wicri:regionArea>
<wicri:noRegion>Medellín</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Clay, Oliver Keatinge" sort="Clay, Oliver Keatinge" uniqKey="Clay O" first="Oliver Keatinge" last="Clay">Oliver Keatinge Clay</name>
<affiliation wicri:level="1">
<nlm:affiliation>Cellular & Molecular Biology Unit, Corporación para Investigaciones Biológicas, Medellín, Colombia; School of Medicine and Health Sciences, Universidad del Rosario, Bogotá, Colombia. Electronic address: oliver.clay@gmail.com.</nlm:affiliation>
<country xml:lang="fr">Colombie</country>
<wicri:regionArea>Cellular & Molecular Biology Unit, Corporación para Investigaciones Biológicas, Medellín, Colombia; School of Medicine and Health Sciences, Universidad del Rosario, Bogotá</wicri:regionArea>
<wicri:noRegion>Bogotá</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2014">2014</date>
<idno type="RBID">pubmed:25262360</idno>
<idno type="pmid">25262360</idno>
<idno type="doi">10.1016/j.compbiolchem.2014.08.014</idno>
<idno type="wicri:Area/PubMed/Corpus">001837</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001837</idno>
<idno type="wicri:Area/PubMed/Curation">001837</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">001837</idno>
<idno type="wicri:Area/PubMed/Checkpoint">001713</idno>
<idno type="wicri:explorRef" wicri:stream="Checkpoint" wicri:step="PubMed">001713</idno>
<idno type="wicri:Area/Ncbi/Merge">000F12</idno>
<idno type="wicri:Area/Ncbi/Curation">000F12</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">000F12</idno>
<idno type="wicri:Area/Main/Merge">001A59</idno>
<idno type="wicri:Area/Main/Curation">001A54</idno>
<idno type="wicri:Area/Main/Exploration">001A54</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">The complex task of choosing a de novo assembly: lessons from fungal genomes.</title>
<author>
<name sortKey="Gallo, Juan Esteban" sort="Gallo, Juan Esteban" uniqKey="Gallo J" first="Juan Esteban" last="Gallo">Juan Esteban Gallo</name>
<affiliation wicri:level="1">
<nlm:affiliation>Cellular & Molecular Biology Unit, Corporación para Investigaciones Biológicas, Medellín, Colombia; Doctoral Program in Biomedical Sciences, Universidad del Rosario, Bogotá, Colombia.</nlm:affiliation>
<country xml:lang="fr">Colombie</country>
<wicri:regionArea>Cellular & Molecular Biology Unit, Corporación para Investigaciones Biológicas, Medellín, Colombia; Doctoral Program in Biomedical Sciences, Universidad del Rosario, Bogotá</wicri:regionArea>
<wicri:noRegion>Bogotá</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Mu Oz, Jose Fernando" sort="Mu Oz, Jose Fernando" uniqKey="Mu Oz J" first="José Fernando" last="Mu Oz">José Fernando Mu Oz</name>
<affiliation wicri:level="1">
<nlm:affiliation>Cellular & Molecular Biology Unit, Corporación para Investigaciones Biológicas, Medellín, Colombia; Institute of Biology, Universidad de Antioquia, Medellín, Colombia.</nlm:affiliation>
<country xml:lang="fr">Colombie</country>
<wicri:regionArea>Cellular & Molecular Biology Unit, Corporación para Investigaciones Biológicas, Medellín, Colombia; Institute of Biology, Universidad de Antioquia, Medellín</wicri:regionArea>
<wicri:noRegion>Medellín</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Misas, Elizabeth" sort="Misas, Elizabeth" uniqKey="Misas E" first="Elizabeth" last="Misas">Elizabeth Misas</name>
<affiliation wicri:level="1">
<nlm:affiliation>Cellular & Molecular Biology Unit, Corporación para Investigaciones Biológicas, Medellín, Colombia; Institute of Biology, Universidad de Antioquia, Medellín, Colombia.</nlm:affiliation>
<country xml:lang="fr">Colombie</country>
<wicri:regionArea>Cellular & Molecular Biology Unit, Corporación para Investigaciones Biológicas, Medellín, Colombia; Institute of Biology, Universidad de Antioquia, Medellín</wicri:regionArea>
<wicri:noRegion>Medellín</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Mcewen, Juan Guillermo" sort="Mcewen, Juan Guillermo" uniqKey="Mcewen J" first="Juan Guillermo" last="Mcewen">Juan Guillermo Mcewen</name>
<affiliation wicri:level="1">
<nlm:affiliation>Cellular & Molecular Biology Unit, Corporación para Investigaciones Biológicas, Medellín, Colombia; School of Medicine, Universidad de Antioquia, Medellín, Colombia.</nlm:affiliation>
<country xml:lang="fr">Colombie</country>
<wicri:regionArea>Cellular & Molecular Biology Unit, Corporación para Investigaciones Biológicas, Medellín, Colombia; School of Medicine, Universidad de Antioquia, Medellín</wicri:regionArea>
<wicri:noRegion>Medellín</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Clay, Oliver Keatinge" sort="Clay, Oliver Keatinge" uniqKey="Clay O" first="Oliver Keatinge" last="Clay">Oliver Keatinge Clay</name>
<affiliation wicri:level="1">
<nlm:affiliation>Cellular & Molecular Biology Unit, Corporación para Investigaciones Biológicas, Medellín, Colombia; School of Medicine and Health Sciences, Universidad del Rosario, Bogotá, Colombia. Electronic address: oliver.clay@gmail.com.</nlm:affiliation>
<country xml:lang="fr">Colombie</country>
<wicri:regionArea>Cellular & Molecular Biology Unit, Corporación para Investigaciones Biológicas, Medellín, Colombia; School of Medicine and Health Sciences, Universidad del Rosario, Bogotá</wicri:regionArea>
<wicri:noRegion>Bogotá</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Computational biology and chemistry</title>
<idno type="eISSN">1476-928X</idno>
<imprint>
<date when="2014" type="published">2014</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithms</term>
<term>Benchmarking</term>
<term>Contig Mapping (statistics & numerical data)</term>
<term>DNA, Intergenic</term>
<term>Genome, Fungal</term>
<term>High-Throughput Nucleotide Sequencing</term>
<term>Open Reading Frames</term>
<term>Paracoccidioides (genetics)</term>
<term>Repetitive Sequences, Nucleic Acid</term>
<term>Sequence Analysis, DNA (statistics & numerical data)</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr">
<term>ADN intergénique</term>
<term>Algorithmes</term>
<term>Analyse de séquence d'ADN (statistiques et données numériques)</term>
<term>Cadres ouverts de lecture</term>
<term>Cartographie de contigs (statistiques et données numériques)</term>
<term>Génome fongique</term>
<term>Paracoccidioides (génétique)</term>
<term>Référenciation</term>
<term>Séquences répétées d'acides nucléiques</term>
<term>Séquençage nucléotidique à haut débit</term>
</keywords>
<keywords scheme="MESH" type="chemical" xml:lang="en">
<term>DNA, Intergenic</term>
</keywords>
<keywords scheme="MESH" qualifier="genetics" xml:lang="en">
<term>Paracoccidioides</term>
</keywords>
<keywords scheme="MESH" qualifier="génétique" xml:lang="fr">
<term>Paracoccidioides</term>
</keywords>
<keywords scheme="MESH" qualifier="statistics & numerical data" xml:lang="en">
<term>Contig Mapping</term>
<term>Sequence Analysis, DNA</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Algorithms</term>
<term>Benchmarking</term>
<term>Genome, Fungal</term>
<term>High-Throughput Nucleotide Sequencing</term>
<term>Open Reading Frames</term>
<term>Repetitive Sequences, Nucleic Acid</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr">
<term>ADN intergénique</term>
<term>Algorithmes</term>
<term>Analyse de séquence d'ADN</term>
<term>Cadres ouverts de lecture</term>
<term>Cartographie de contigs</term>
<term>Génome fongique</term>
<term>Référenciation</term>
<term>Séquences répétées d'acides nucléiques</term>
<term>Séquençage nucléotidique à haut débit</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Selecting the values of parameters used by de novo genomic assembly programs, or choosing an optimal de novo assembly from several runs obtained with different parameters or programs, are tasks that can require complex decision-making. A key parameter that must be supplied to typical next generation sequencing (NGS) assemblers is the k-mer length, i.e., the word size that determines which de Bruijn graph the program should map out and use. The topic of assembly selection criteria was recently revisited in the Assemblathon 2 study (Bradnam et al., 2013). Although no clear message was delivered with regard to optimal k-mer lengths, it was shown with examples that it is sometimes important to decide if one is most interested in optimizing the sequences of protein-coding genes (the gene space) or in optimizing the whole genome sequence including the intergenic DNA, as what is best for one criterion may not be best for the other. In the present study, our aim was to better understand how the assembly of unicellular fungi (which are typically intermediate in size and complexity between prokaryotes and metazoan eukaryotes) can change as one varies the k-mer values over a wide range. We used two different de novo assembly programs (SOAPdenovo2 and ABySS), and simple assembly metrics that also focused on success in assembling the gene space and repetitive elements. A recent increase in Illumina read length to around 150 bp allowed us to attempt de novo assemblies with a larger range of k-mers, up to 127 bp. We applied these methods to Illumina paired-end sequencing read sets of fungal strains of Paracoccidioides brasiliensis and other species. 
By visualizing the results in simple plots, we were able to track the effect of changing k-mer size and assembly program, and to demonstrate how such plots can readily reveal discontinuities or other unexpected characteristics that assembly programs can present in practice, especially when they are used in a traditional molecular microbiology laboratory with a 'genomics corner'. Here we propose and apply a component of a first pass validation methodology for benchmarking and understanding fungal genome de novo assembly processes.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Colombie</li>
</country>
</list>
<tree>
<country name="Colombie">
<noRegion>
<name sortKey="Gallo, Juan Esteban" sort="Gallo, Juan Esteban" uniqKey="Gallo J" first="Juan Esteban" last="Gallo">Juan Esteban Gallo</name>
</noRegion>
<name sortKey="Clay, Oliver Keatinge" sort="Clay, Oliver Keatinge" uniqKey="Clay O" first="Oliver Keatinge" last="Clay">Oliver Keatinge Clay</name>
<name sortKey="Mcewen, Juan Guillermo" sort="Mcewen, Juan Guillermo" uniqKey="Mcewen J" first="Juan Guillermo" last="Mcewen">Juan Guillermo Mcewen</name>
<name sortKey="Misas, Elizabeth" sort="Misas, Elizabeth" uniqKey="Misas E" first="Elizabeth" last="Misas">Elizabeth Misas</name>
<name sortKey="Mu Oz, Jose Fernando" sort="Mu Oz, Jose Fernando" uniqKey="Mu Oz J" first="José Fernando" last="Mu Oz">José Fernando Mu Oz</name>
</country>
</tree>
</affiliations>
</record>
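
Below is a minimal sketch of pulling identifiers such as the PMID or DOI out of a record like this with a general-purpose tool. It uses Python's standard `xml.etree.ElementTree` module. It runs on a hand-made, well-formed excerpt of the `<publicationStmt>`. It does not run on the record verbatim, because the display above shows `&` unescaped and uses the `wicri:` and `nlm:` prefixes without namespace declarations. A strict XML parser would reject a raw copy until those are fixed. The helper name `idnos` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-made excerpt of the record's <publicationStmt>; elements and attributes
# carrying the wicri:/nlm: prefixes are left out so it parses as-is.
RECORD = """<publicationStmt>
  <idno type="wicri:source">PubMed</idno>
  <date when="2014">2014</date>
  <idno type="RBID">pubmed:25262360</idno>
  <idno type="pmid">25262360</idno>
  <idno type="doi">10.1016/j.compbiolchem.2014.08.014</idno>
</publicationStmt>"""


def idnos(xml_text: str) -> dict:
    """Map each <idno> type attribute to its text (hypothetical helper).
    If several <idno> elements share a type, the last one wins."""
    root = ET.fromstring(xml_text)
    return {el.get("type"): (el.text or "").strip() for el in root.iter("idno")}


if __name__ == "__main__":
    ids = idnos(RECORD)
    print(ids["pmid"], ids["doi"])
```

A dictionary keyed on `type` is the simplest choice for this excerpt. In the full record, several `<idno>` elements share the type `wicri:explorRef`, so a list of `(type, text)` pairs would be safer there.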

To process this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001A54 | SxmlIndent | more

Or

EXPLOR_AREA=$WICRI_ROOT/Sante/explor/MersV1
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001A54 | SxmlIndent | more

To add a link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     pubmed:25262360
   |texte=   The complex task of choosing a de novo assembly: lessons from fungal genomes.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i   -Sk "pubmed:25262360" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021